HyCA: A Hybrid Computing Architecture for Fault-Tolerant Deep Learning
نویسندگان
چکیده
Hardware faults on the regular 2-D computing array of a typical deep learning accelerator (DLA) can lead to dramatic prediction accuracy loss. Prior redundancy design approaches typically have each homogeneous redundant processing element (PE) mitigate faulty PEs for limited region rather than entire avoid excessive hardware overhead. However, they fail recover when number in any exceeds same region. The mismatch problem deteriorates fault injection rate rises and are unevenly distributed. To address problem, we propose hybrid architecture (HyCA) fault-tolerant DLAs. It has set dot-production units (DPPUs) recompute all operations that mapped despite PE locations. According our experiments, HyCA shows significantly higher reliability, scalability, performance with less chip area penalty compared conventional approaches. Moreover, by taking advantage flexible recomputing, also be utilized scan detect effectively at runtime.
منابع مشابه
A Hybrid Fault-Tolerant Architecture for Highly Reliable Processing Cores
Increasing vulnerability of transistors and interconnects due to scaling is continuously challenging the reliability of future microprocessors. Lifetime reliability is gaining attention over performance as a design factor even for lower-end commodity applications. In this work we present a low-power hybrid fault tolerant architecture for reliability improvement of pipelined microprocessors by p...
متن کاملA Hybrid Optimization Algorithm for Learning Deep Models
Deep learning is one of the subsets of machine learning that is widely used in Artificial Intelligence (AI) field such as natural language processing and machine vision. The learning algorithms require optimization in multiple aspects. Generally, model-based inferences need to solve an optimized problem. In deep learning, the most important problem that can be solved by optimization is neural n...
متن کاملA Hybrid Optimization Algorithm for Learning Deep Models
Deep learning is one of the subsets of machine learning that is widely used in Artificial Intelligence (AI) field such as natural language processing and machine vision. The learning algorithms require optimization in multiple aspects. Generally, model-based inferences need to solve an optimized problem. In deep learning, the most important problem that can be solved by optimization is neural n...
متن کاملA Microprocessor-Based Hybrid Duplex Fault-Tolerant System
Reliability is one of the fundamental considerations in the design of industrial control equipment. The microprocessor-based Hybrid Duplex fault-tolerant System (HDS) proposed in this paper has high reliability to meet this demand although its hardware structure is simple. The hardware configuration of HDS and the fault tolerance of this system are described. The switching control strategies in...
متن کاملHybrid Log-based Fault Tolerant Scheme for Mobile Computing System
Many new characteristics are introduced in the mobile computing system, such as mobility, disconnections, finite power source, vulnerable to physical damage, lack of stable storage. Since the related log-based rollback recovery fault tolerant schemes may still lead to dramatic performance loss in failure-free or inconsistent recovery caused by the fault, a hybrid log-based fault tolerant scheme...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
سال: 2022
ISSN: ['1937-4151', '0278-0070']
DOI: https://doi.org/10.1109/tcad.2021.3124763